Data mining source code for locating software bugs: A case study in telecommunication industry
Abstract
In a large software system, knowing which files are most likely to be fault-prone is valuable information for project managers. They can use such information to prioritize software testing and allocate resources accordingly. However, our experience shows that it is difficult to collect and analyze fine-grained test defects in a large and complex software system. On the other hand, previous research has shown that companies can safely use cross-company data with nearest neighbor sampling to predict their defects when they are unable to collect local data. In this study, we analyzed 25 projects of a large telecommunication system. To predict the defect proneness of modules, we trained models on publicly available NASA MDP data. In our experiments, we used static call graph based ranking (CGBR) as well as nearest neighbor sampling to construct method-level defect predictors. Our results suggest that, for the analyzed projects, at least 70% of the defects can be detected by inspecting only (i) 6% of the code using a Naïve Bayes model, or (ii) 3% of the code using the CGBR framework. © 2008 Elsevier Ltd. All rights reserved.
Similar resources
Checking Language Dependent Accuracy of Web Applications using Data Mining Techniques
Over the last decade, web applications have become very popular and increasingly user-oriented. Various languages are used to develop web applications, such as PHP, Java, and ASP.NET. A web application is not developed by an individual; it is the result of a team's effort. Different types of bugs and errors are present in source code. Finding out these bugs or errors i...
Locating Matching Method Calls by Mining Revision History Data
Developing an appropriate fix for a software bug often requires a detailed examination of the code as well as generation of appropriate test cases. However, certain categories of bugs are usually easy to fix. In this paper we focus on bugs that can be corrected with a one-line code change. As it turns out, one-line source code changes very often represent bug fixes. Moreover, a significant frac...
A family of code coverage-based heuristics for effective fault localization
doi:10.1016/j.jss.2009.09.037. © 2009 Elsevier Inc. ...
Static Detection of API Error-Handling Bugs via Mining Source Code
Incorrect handling of errors incurred after API invocations (in short, API errors) can lead to security and robustness problems, two primary threats to software reliability. Correct handling of API errors can be specified as formal specifications, verifiable by static checkers, to ensure dependable computing. But API error specifications are often unavailable or imprecise, and cannot be inferre...
Learning Unified Features from Natural and Programming Languages for Locating Buggy Source Code
Bug reports provide an effective way for end-users to disclose potential bugs hidden in a software system, while automatically locating the potential buggy source code according to a bug report remains a great challenge in software maintenance. Many previous studies treated the source code as natural language by representing both the bug report and source code based on bag-of-words feature repr...
Journal title:
- Expert Syst. Appl.
Volume 36
Publication date: 2009